Shihan Dou

PlanningBench: Generating Scalable and Verifiable Planning Data for Evaluating and Training Large Language Models

May 20, 2026
Via arXiv

LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening

May 19, 2026
Via arXiv

Entropy Polarity in Reinforcement Fine-Tuning: Direction, Asymmetry, and Control

May 14, 2026
Via arXiv

Agentic Harness Engineering: Observability-Driven Automatic Evolution of Coding-Agent Harnesses

Apr 28, 2026
Via arXiv

EVPO: Explained Variance Policy Optimization for Adaptive Critic Utilization in LLM Post-Training

Apr 21, 2026
Via arXiv

Enhancing LLM-based Search Agents via Contribution Weighted Group Relative Policy Optimization

Apr 15, 2026
Via arXiv

Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges

Apr 15, 2026
Via arXiv

MM-Doc-R1: Training Agents for Long Document Visual Question Answering through Multi-turn Reinforcement Learning

Apr 15, 2026
Via arXiv

A Decomposition Perspective to Long-context Reasoning for LLMs

Apr 09, 2026
Via arXiv

JFTA-Bench: Evaluate LLM's Ability of Tracking and Analyzing Malfunctions Using Fault Trees

Mar 24, 2026
Via arXiv